Distinguishing Topic from Genre

Authors

  • Benno Stein
  • Sven Meyer
Abstract

This paper contributes to a facet of Web Information Retrieval that has recently received much attention: the satisfaction of a user’s personal information need with respect to text type, presentation type, or information quality. We assume that such properties can be quantified for all kinds of Web documents, and we subsume them under the term “Web genre” or “genre”. Recent surveys show that there is, to a certain degree, a common understanding of Web genre. However, the strictness with which genre and non-genre aspects of a document are experienced is an individual matter. To better understand the challenges of Web genre identification and its possible limits, we investigate a question that has not been posed until now: given a categorization C of documents (or bookmarks, links, document identifiers), can we reliably assess whether C is governed by topic or by genre considerations? We present instruments to answer this question and to make a clear statement about the homogeneity of a categorization.

Similar resources

Stereotyping the Web: Genre Classification of Web Documents

Retrieving relevant documents over the Web is a difficult task. Currently, search engines rely on keywords for matching documents to user queries. This paper explores the potential for discriminating documents based on the genre of the document. I define genre as a taxonomy that incorporates the style, form and content of a d...

One Sense per Collocation and Genre/Topic Variations

This paper revisits the one sense per collocation hypothesis using fine-grained sense distinctions and two different corpora. We show that the hypothesis is weaker for fine-grained sense distinctions (70% vs. 99% reported earlier on 2-way ambiguities). We also show that one sense per collocation does hold across corpora, but that collocations vary from one corpus to the other, following genre a...

Squibs: Stable Classification of Text Genres

Every text has at least one topic and at least one genre. Evidence for a text’s topic and genre comes, in part, from its lexical and syntactic features—features used in both Automatic Topic Classification and Automatic Genre Classification (AGC). Because an ideal AGC system should be stable in the face of changes in topic distribution, we assess five previously published AGC methods with respec...

Thematization Strategies in the Generic Moves of Research Article Introductions

Despite the heterogeneity of ideas regarding the definitions of genre, there are also common instances shared among scholars interested in particular aspects of the notion. Swales (1990) and Bhatia (1993) are primarily interested in the sociological and psychological aspects of genre's functioning and construction, respectively. Swales analyzes the genre of 'article introduction' into four ge...

What's in a Domain? Analyzing Genre and Topic Differences in Statistical Machine Translation

Domain adaptation is an active field of research in statistical machine translation (SMT), but so far most work has ignored the distinction between the topic and genre of documents. In this paper we quantify and disentangle the impact of genre and topic differences on translation quality by introducing a new data set that has controlled topic and genre distributions. In addition, we perform a d...

Publication date: 2008